Node Types

Introduction to nodes

A node is a building block of a Towhee pipeline. Each node applies its own function or transformation to the data it receives. By connecting nodes, you can build pipelines in the form of directed acyclic graphs (DAGs). If you are new to Towhee, we highly recommend reading the section Create Your First Pipeline before going through this programming guide.

Node types

Currently, Towhee supports nine types of nodes. The input and output nodes define a pipeline's input and output. The remaining seven types, each with its own data transformation, are typically used for data processing and analytics. The following table lists the nine node types in Towhee and their corresponding interfaces.

Node Type	Description
input(input_schema)	This node defines the input schema of a pipeline and begins the pipeline definition. Note that a pipeline's input schema cannot be empty. Refer to input API for more details.
output(output_schema)	This node defines the output schema of a pipeline and ends the pipeline definition. Once it is called, a pipeline instance is created and returned. Refer to output API for more details.
map(input_schema, output_schema, func)	This node applies the given function `func` to each of its inputs and returns the transformed data. `map` returns one row for every row of input. Refer to map API for more details.
flat_map(input_schema, output_schema, func)	This node applies the function `func` to every row of input, flattens the results, and returns the flattened data. The output can have the same number of rows as the input or more. This is one of the major differences between `flat_map` and `map`: `map` always returns the same number of rows as the input. Refer to flat_map API for more details.
filter(input_schema, output_schema, filter_columns, func)	This node applies the filter function `func` to the `filter_columns` and uses the result to decide which input rows are passed on. Refer to filter API for more details.
window(input_schema, output_schema, size, step, func)	This node batches the input rows into windows based on the specified `size` and `step`. It then applies the function `func` to each window and returns one row of results per window. Refer to window API for more details.
time_window(input_schema, output_schema, timestamp_col, size, step, func)	This node batches rows that form a time sequence, for example, audio or video frames. `time_window` is similar to `window`, but the batching rule is based on a timestamp column (`timestamp_col`). `size` is the time interval covered by each window, and `step` is the time interval between the start of one window and the start of the next. Note that if `step` is less than `size`, the windows overlap. Refer to time_window API for more details.
window_all(input_schema, output_schema, func)	This node batches all input rows into a single window, applies the function `func` to that window, and returns the result. Refer to window_all API for more details.
concat(pipelines)	This node concatenates the intermediate results of multiple pipelines and combines those pipelines into a larger one. Refer to concat API for more details.
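
The row-count behavior described in the table can be sketched in plain Python. This is an illustration only: it does not use the Towhee API, and the helper names (`map_rows`, `flat_map_rows`, `filter_rows`, `window_rows`, `time_window_rows`, `window_all_rows`) are hypothetical. The sketch keeps rows for which the filter function returns a truthy value, keeps a trailing partial window, and skips empty time windows. These are assumptions of the sketch, so check the corresponding API pages for the actual behavior.

```python
from typing import Any, Callable, List, Sequence, Tuple


def map_rows(rows: Sequence[Any], func: Callable) -> List[Any]:
    """One output row for every input row."""
    return [func(row) for row in rows]


def flat_map_rows(rows: Sequence[Any], func: Callable) -> List[Any]:
    """func returns an iterable per row; the results are flattened."""
    return [item for row in rows for item in func(row)]


def filter_rows(rows: Sequence[Any], func: Callable) -> List[Any]:
    """Assumption: keep the rows for which func returns a truthy value."""
    return [row for row in rows if func(row)]


def window_rows(rows: Sequence[Any], size: int, step: int, func: Callable) -> List[Any]:
    """Batch rows by count; one result per window.

    Assumption: a trailing partial window is kept.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [func(rows[i:i + size]) for i in range(0, len(rows), step)]


def time_window_rows(
    rows: Sequence[Tuple[float, Any]], size: float, step: float, func: Callable
) -> List[Any]:
    """Batch (timestamp, value) rows by time; one result per non-empty window.

    Each window covers the half-open interval [start, start + size),
    with start = 0, step, 2 * step, and so on.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if not rows:
        return []
    last_ts = max(ts for ts, _ in rows)
    results = []
    start = 0
    while start <= last_ts:
        batch = [value for ts, value in rows if start <= ts < start + size]
        if batch:  # Assumption: empty windows produce no output row.
            results.append(func(batch))
        start += step
    return results


def window_all_rows(rows: Sequence[Any], func: Callable) -> List[Any]:
    """All rows form a single window, so there is exactly one result."""
    return [func(list(rows))]


rows = [1, 2, 3, 4, 5]
print(map_rows(rows, lambda x: x * 10))         # [10, 20, 30, 40, 50]
print(flat_map_rows([[1, 2], [3]], lambda x: x))  # [1, 2, 3]
print(filter_rows(rows, lambda x: x % 2 == 1))  # [1, 3, 5]
print(window_rows(rows, 2, 2, sum))             # [3, 7, 5]
print(window_rows(rows, 3, 2, sum))             # [6, 12, 5] (step < size, so windows overlap)
print(window_all_rows(rows, sum))               # [15]
```

In the second `window_rows` call, `step` is smaller than `size`, so the value 3 contributes to two windows. The same overlap rule applies to `time_window_rows`, where the window boundaries are timestamps rather than row counts.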
